Induction of a Stem Lexicon for Two-level Morphological Analysis
Abstract
A method is described to automatically acquire from text corpora a Portuguese stem lexicon for two-level morphological analysis. It makes use of a lexical transducer to generate all possible stems for a given unknown inflected word form, and the EM algorithm to rank alternative stems.

1 Motivation

Morphological analysis is the basis for most natural language processing tasks. Hand-coded lists used in morphological processing are expensive to create and maintain. A procedure to automatically induce a stem lexicon from text corpora would enable the creation, verification and update of broad-coverage lexica which reflect evolving usage and are less subject to lexical gaps. Such a procedure would also be applicable to the acquisition of domain-specific vocabularies, given appropriate corpora.

In the following, a method is described to automatically generate a stem lexicon for two-level morphological analysis (Koskenniemi, 1983). The method, which was implemented and tested on a newspaper corpus of Brazilian Portuguese, is applicable to other languages as well.
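The two steps named in the abstract, generating every possible stem for an inflected form and then ranking the alternatives with EM, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the text above does not give the transducer, the probability model, or the update equations. A hypothetical suffix list (`SUFFIXES`) stands in for the lexical transducer, and the model is a plain mixture in which each word token is generated by exactly one of its candidate stems.

```python
from collections import defaultdict

# Hypothetical suffix inventory; a toy stand-in for the lexical transducer.
SUFFIXES = ["", "s", "o", "a", "os", "as", "ar", "amos"]


def candidate_stems(word):
    """Return every stem obtained by stripping one listed suffix from `word`."""
    stems = set()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stems.add(word[: len(word) - len(suffix)])
    return sorted(stems)


def em_rank(word_counts, iterations=20):
    """Estimate stem probabilities with EM under an assumed mixture model.

    word_counts maps each inflected form to its corpus frequency.
    Returns (stem probabilities, candidate stems per word).
    """
    cands = {w: candidate_stems(w) for w in word_counts}
    stems = {s for cs in cands.values() for s in cs}
    # Start from a uniform distribution over all candidate stems.
    prob = {s: 1.0 / len(stems) for s in stems}

    for _ in range(iterations):
        # E-step: split each word's count among its candidate stems,
        # in proportion to the current stem probabilities.
        expected = defaultdict(float)
        for word, count in word_counts.items():
            norm = sum(prob[s] for s in cands[word])
            for s in cands[word]:
                expected[s] += count * prob[s] / norm
        # M-step: renormalize expected counts into probabilities.
        total = sum(expected.values())
        prob = {s: expected[s] / total for s in stems}

    return prob, cands


def best_stem(word, prob, cands):
    """Pick the highest-probability candidate stem for one word form."""
    return max(cands[word], key=lambda s: prob[s])
```

Calling `em_rank` on a small frequency table such as `{"falamos": 3, "falar": 2, "fala": 4, "falas": 1}` and then `best_stem("falamos", prob, cands)` shows the intended effect: a stem shared by many word forms accumulates probability mass over the iterations, while stems that explain only a single form lose it.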